{ "cells": [ { "cell_type": "markdown", "id": "bf77e2af-f181-4470-859d-bfccfe625514", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# Loading Feature Information" ] }, { "cell_type": "raw", "id": "ab49671d-72aa-4a07-ad02-316f219f88ba", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ ".. currentmodule:: massdash" ] }, { "cell_type": "raw", "id": "5f4c5344-f344-41a3-ab29-5d29e584fd82", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Regardless of whether a :py:class:`~loaders.ChromatogramLoader` or :py:class:`~loaders.SpectrumLoader` is used, MassDash can read information on the features found for a particular peptide from the `OpenSWATH` or `DIA-NN` output. " ] }, { "cell_type": "markdown", "id": "ee86ae1c-dfed-452b-b7e9-a1d4bd657600", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "As a demonstration, we will compare the features found for the peptide `AFVDFLSDEIK` with a charge state of 2." ] }, { "cell_type": "code", "execution_count": 1, "id": "0472071d-141e-4858-9037-0f1b6962a77d", "metadata": { "editable": true, "nbsphinx": "hidden", "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "id": "f15a7806-3adb-41be-8d81-5e1eda910d71", "metadata": { "editable": true, "nbsphinx": "hidden", "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# Please run this before executing any cell\n", "import os\n", "os.chdir(\"../../test/test_data/\") #### Insert path to data, this is the path to the tutorial data. 
" ] }, { "cell_type": "markdown", "id": "41187449-e751-44b2-830b-91212c622ac1", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Loading Transition Group Features" ] }, { "cell_type": "raw", "id": "34254c3e-c91d-4f01-a709-a5541e8cc1c3", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "For this example, let's initialize an :py:class:`~.loaders.MzMLDataLoader` with both `DIA-NN` and `OpenSWATH` outputs so that we can compare their features. Note that multiple results files can be loaded in the same loader by supplying a list of the paths to the results files." ] }, { "cell_type": "code", "execution_count": 3, "id": "6436fa7e-b8be-4325-a3b2-905e9b58b22f", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Initializing valid scores for selection\n", "[2024-10-10 09:15:32,714] MzMLDataAccess - INFO - Opening mzml/ionMobilityTest.mzML file...: Elapsed 0.08444595336914062 ms\n", "[2024-10-10 09:15:32,715] MzMLDataAccess - INFO - There are 50 spectra and 0 chromatograms.\n", "[2024-10-10 09:15:32,715] MzMLDataAccess - INFO - There are 25 MS1 spectra and 25 MS2 spectra.\n" ] } ], "source": [ "from massdash.loaders import MzMLDataLoader\n", "loader = MzMLDataLoader(dataFiles=\"mzml/ionMobilityTest.mzML\",\n", " rsltsFile=[\"osw/ionMobilityTest.osw\", \"diann/ionMobilityTest-diannReport.tsv\"])" ] }, { "cell_type": "raw", "id": "8c0ab2d9-5a92-474b-aff8-beefad7285a6", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "MassDash uses :py:class:`~structs.TransitionGroupFeature` objects to store metadata on each detected feature. 
At a minimum, this contains the retention time boundaries detected by the software tool, but other information, as described below, may also be present." ] }, { "cell_type": "raw", "id": "64a32de0-f1f7-4d5d-bc07-914fa824c4b2", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ ".. autoclass:: massdash.structs.TransitionGroupFeature\n", " :noindex:" ] }, { "cell_type": "raw", "id": "0fbb12a9-03d8-48be-a7c0-f9ec407bdd4e", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "To load the top :py:class:`~structs.TransitionGroupFeature`, we can use the :py:func:`~loaders.GenericLoader.loadTopTransitionGroupFeature` method. " ] }, { "cell_type": "code", "execution_count": 5, "id": "292b3c81-4f9b-4d72-a36b-caed53f614ae", "metadata": { "editable": true, "scrolled": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "TransitionGroupFeatureCollection\n", "ionMobilityTest: [-------- TransitionGroupFeature --------\n", "leftBoundary: 6235.8486328125\n", "rightBoundary: 6248.42822265625\n", "areaIntensity: 352642.16135025\n", "consensusApex: 6242.15\n", "consensusApexIntensity: 352642.16135025\n", "qvalue: 3.5084067486223456e-05\n", "consensusApexIM: 0.978579389257473\n", "precursor_mz: None\n", "precursor_charge: 2\n", "product_annotations: None\n", "product_mz: None\n", "sequence: AFVDFLSDEIK\n", "software: OpenSWATH, -------- TransitionGroupFeature --------\n", "leftBoundary: 6236.052702\n", "rightBoundary: 6248.63205\n", "areaIntensity: 1137201.5\n", "consensusApex: 6241.44882\n", "consensusApexIntensity: None\n", "qvalue: 7.964108227e-05\n", "consensusApexIM: 0.9800000191\n", "precursor_mz: None\n", "precursor_charge: 2\n", "product_annotations: None\n", "product_mz: None\n", "sequence: AFVDFLSDEIK\n", "software: DIA-NN]" ] }, "execution_count": 5, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "features = loader.loadTopTransitionGroupFeature(\"AFVDFLSDEIK\", 2)\n", "features" ] }, { "cell_type": "markdown", "id": "e5a3c6a1-0905-4e93-86dd-8112b0248f29", "metadata": {}, "source": [ "This method returns a dictionary where the keys are the run names and each value is a list of `TransitionGroupFeature` objects. The \"software\" field shows which software tool each feature was found by." ] }, { "cell_type": "markdown", "id": "c7bdfb24-981d-4c3a-94f1-2a331c4e9df3", "metadata": {}, "source": [ "Here, we can see that both `OpenSWATH` and `DIA-NN` detect the same feature, since the left and right boundaries and the `consensusApex` are approximately equal. The intensities differ because the two tools compute intensity differently: `OpenSWATH` sums the intensity across all fragments, while `DIA-NN` sums the intensity across the top 3 fragment ions. " ] }, { "cell_type": "code", "execution_count": 6, "id": "0bfaa924-653a-44a4-9fea-9142c924128d", "metadata": { "editable": true, "nbsphinx": "hidden", "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "3079256.37492" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Proof that the intensities are actually similar\n", "import pandas as pd\n", "df = pd.read_csv(\"ionMobilityTest2/ionMobilityTest2-diannReport.tsv\", sep='\\t')\n", "sum([ float(i) for i in df['Fragment.Quant.Raw'].iloc[1].split(';')[:-1] ])" ] }, { "cell_type": "markdown", "id": "831a5e9d-eb7e-490f-bf0b-86dcf7ccc857", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Loading The Top Transition Group Features as a Pandas DataFrame" ] }, { "cell_type": "raw", "id": "666a9577-bc0a-4933-b910-776c9957426d", "metadata": { "editable": true, "jp-MarkdownHeadingCollapsed": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": 
[] }, "source": [ "For easier data manipulation, MassDash can also load feature information directly into a pandas DataFrame using the :py:func:`~loaders.GenericLoader.loadTopTransitionGroupFeatureDf` method. However, this limits its use with downstream MassDash tools." ] }, { "cell_type": "code", "execution_count": 8, "id": "b5297af2-8399-400f-866e-d02debb78bb3", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>runName</th><th>leftBoundary</th><th>rightBoundary</th><th>areaIntensity</th><th>qvalue</th><th>consensusApex</th><th>consensusApexIntensity</th><th>sequence</th><th>precursor_charge</th><th>software</th><th>consensusApexIM</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>ionMobilityTest</td><td>6235.848633</td><td>6248.428223</td><td>2848190.0</td><td>0.000035</td><td>6242.15000</td><td>352642.16135</td><td>AFVDFLSDEIK</td><td>2</td><td>OpenSWATH</td><td>NaN</td></tr>\n",
"    <tr><th>1</th><td>ionMobilityTest</td><td>6236.052702</td><td>6248.632050</td><td>1137201.5</td><td>0.000080</td><td>6241.44882</td><td>NaN</td><td>AFVDFLSDEIK</td><td>2</td><td>DIA-NN</td><td>0.98</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " runName leftBoundary rightBoundary areaIntensity qvalue \\\n", "0 ionMobilityTest 6235.848633 6248.428223 2848190.0 0.000035 \n", "1 ionMobilityTest 6236.052702 6248.632050 1137201.5 0.000080 \n", "\n", " consensusApex consensusApexIntensity sequence precursor_charge \\\n", "0 6242.15000 352642.16135 AFVDFLSDEIK 2 \n", "1 6241.44882 NaN AFVDFLSDEIK 2 \n", "\n", " software consensusApexIM \n", "0 OpenSWATH NaN \n", "1 DIA-NN 0.98 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loader.loadTopTransitionGroupFeatureDf(\"AFVDFLSDEIK\", 2)" ] }, { "cell_type": "markdown", "id": "270d5109-eff2-4175-8059-b1075cb9f284", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Loading All TransitionGroupFeatures" ] }, { "cell_type": "raw", "id": "2d253d2b-fed0-44ac-90ff-107379d1b661", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "By default, `OpenSWATH` computes several features for a target peptide, and the top feature is determined by `PyProphet`. To load all of the features identified by `OpenSWATH` for a particular precursor, we can use the :py:func:`~loaders.GenericLoader.loadTransitionGroupFeatures` method. 
" ] }, { "cell_type": "code", "execution_count": 9, "id": "f511c91d-f088-488c-88ff-5c5f56e1383d", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "TransitionGroupFeatureCollection\n", "ionMobilityTest: [-------- TransitionGroupFeature --------\n", "leftBoundary: 6235.8486328125\n", "rightBoundary: 6248.42822265625\n", "areaIntensity: 2848190.0\n", "consensusApex: 6242.15\n", "consensusApexIntensity: 352642.16135025\n", "qvalue: 3.5084067486223456e-05\n", "consensusApexIM: 0.978579389257473\n", "precursor_mz: None\n", "precursor_charge: 2\n", "product_annotations: None\n", "product_mz: None\n", "sequence: AFVDFLSDEIK\n", "software: OpenSWATH, -------- TransitionGroupFeature --------\n", "leftBoundary: 6255.64599609375\n", "rightBoundary: 6266.51513671875\n", "areaIntensity: 35433.9\n", "consensusApex: 6256.67\n", "consensusApexIntensity: 2888.98703575134\n", "qvalue: 0.0001776217262565\n", "consensusApexIM: 0.981810896830182\n", "precursor_mz: None\n", "precursor_charge: 2\n", "product_annotations: None\n", "product_mz: None\n", "sequence: AFVDFLSDEIK\n", "software: OpenSWATH, -------- TransitionGroupFeature --------\n", "leftBoundary: 6236.052702\n", "rightBoundary: 6248.63205\n", "areaIntensity: 1137201.5\n", "consensusApex: 6241.44882\n", "consensusApexIntensity: None\n", "qvalue: 7.964108227e-05\n", "consensusApexIM: 0.9800000191\n", "precursor_mz: None\n", "precursor_charge: 2\n", "product_annotations: None\n", "product_mz: None\n", "sequence: AFVDFLSDEIK\n", "software: DIA-NN]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loader.loadTransitionGroupFeatures(\"AFVDFLSDEIK\", 2)" ] }, { "cell_type": "markdown", "id": "b7cfe32b-33ba-472a-b745-9e4fa5c83e81", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "
\n", "\n", "**Note:**\n", "\n", "DIA-NN only outputs one feature per precursor, so calling the `loadTransitionGroupFeatures()` method will output essentially the same result as `loadTopTransitionGroupFeature()` (the only difference is that `loadTransitionGroupFeatures()` outputs a list of `TransitionGroupFeature` objects). \n", "\n", "
" ] }, { "cell_type": "raw", "id": "34bbc9e8-1082-4a42-bfc4-61ceecf52e06", "metadata": {}, "source": [ "This method returns a :py:class:`~structs.TransitionGroupFeatureCollection`, which is a dictionary where the keys are the run names and the values are lists of all the :py:class:`~structs.TransitionGroupFeature` objects found. Here we can see that `OpenSWATH` finds an additional feature; however, in general this should be ignored because the top :py:class:`~structs.TransitionGroupFeature` has a lower `q-value`. " ] }, { "cell_type": "markdown", "id": "b18e520d-272b-4ce0-b65a-5c7598db1d95", "metadata": {}, "source": [ "## Loading All TransitionGroupFeatures In a Pandas DataFrame" ] }, { "cell_type": "raw", "id": "f1a36f45-45e7-4222-9d83-a771e4c5053a", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "To load all :py:class:`~structs.TransitionGroupFeature` objects into a pandas DataFrame, the :py:func:`~loaders.GenericLoader.loadTransitionGroupFeaturesDf` method can be used. " ] }, { "cell_type": "code", "execution_count": 10, "id": "4cde28b7-f917-417d-b0e8-e4d701c59c1a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>runname</th><th>leftBoundary</th><th>rightBoundary</th><th>areaIntensity</th><th>qvalue</th><th>consensusApex</th><th>consensusApexIntensity</th><th>precursor_charge</th><th>sequence</th><th>software</th><th>consensusApexIM</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>ionMobilityTest</td><td>6235.848633</td><td>6248.428223</td><td>2848190.0</td><td>0.000035</td><td>6242.15000</td><td>352642.161350</td><td>2</td><td>AFVDFLSDEIK</td><td>OpenSWATH</td><td>0.978579</td></tr>\n",
"    <tr><th>1</th><td>ionMobilityTest</td><td>6255.645996</td><td>6266.515137</td><td>35433.9</td><td>0.000178</td><td>6256.67000</td><td>2888.987036</td><td>2</td><td>AFVDFLSDEIK</td><td>OpenSWATH</td><td>0.981811</td></tr>\n",
"    <tr><th>2</th><td>ionMobilityTest</td><td>6236.052702</td><td>6248.632050</td><td>1137201.5</td><td>0.000080</td><td>6241.44882</td><td>NaN</td><td>2</td><td>AFVDFLSDEIK</td><td>DIA-NN</td><td>0.980000</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " runname leftBoundary rightBoundary areaIntensity qvalue \\\n", "0 ionMobilityTest 6235.848633 6248.428223 2848190.0 0.000035 \n", "1 ionMobilityTest 6255.645996 6266.515137 35433.9 0.000178 \n", "2 ionMobilityTest 6236.052702 6248.632050 1137201.5 0.000080 \n", "\n", " consensusApex consensusApexIntensity precursor_charge sequence \\\n", "0 6242.15000 352642.161350 2 AFVDFLSDEIK \n", "1 6256.67000 2888.987036 2 AFVDFLSDEIK \n", "2 6241.44882 NaN 2 AFVDFLSDEIK \n", "\n", " software consensusApexIM \n", "0 OpenSWATH 0.978579 \n", "1 OpenSWATH 0.981811 \n", "2 DIA-NN 0.980000 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "features_df = loader.loadTransitionGroupFeaturesDf(\"AFVDFLSDEIK\", 2)\n", "features_df" ] }, { "cell_type": "markdown", "id": "e8ef2e2b-12bf-4c3a-a0de-708a9f7e72ff", "metadata": {}, "source": [ "Although the pandas DataFrame output is incompatible with further *MassDash* analysis, it allows for greater flexibility in alternative analyses. For example, we can calculate the peak width of all features as shown below." ] }, { "cell_type": "code", "execution_count": 11, "id": "d8e60660-c069-4c15-a532-d869eeee4d84", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>runname</th><th>leftBoundary</th><th>rightBoundary</th><th>areaIntensity</th><th>qvalue</th><th>consensusApex</th><th>consensusApexIntensity</th><th>precursor_charge</th><th>sequence</th><th>software</th><th>consensusApexIM</th><th>peakWidth</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>ionMobilityTest</td><td>6235.848633</td><td>6248.428223</td><td>2848190.0</td><td>0.000035</td><td>6242.15000</td><td>352642.161350</td><td>2</td><td>AFVDFLSDEIK</td><td>OpenSWATH</td><td>0.978579</td><td>12.579590</td></tr>\n",
"    <tr><th>1</th><td>ionMobilityTest</td><td>6255.645996</td><td>6266.515137</td><td>35433.9</td><td>0.000178</td><td>6256.67000</td><td>2888.987036</td><td>2</td><td>AFVDFLSDEIK</td><td>OpenSWATH</td><td>0.981811</td><td>10.869141</td></tr>\n",
"    <tr><th>2</th><td>ionMobilityTest</td><td>6236.052702</td><td>6248.632050</td><td>1137201.5</td><td>0.000080</td><td>6241.44882</td><td>NaN</td><td>2</td><td>AFVDFLSDEIK</td><td>DIA-NN</td><td>0.980000</td><td>12.579348</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " runname leftBoundary rightBoundary areaIntensity qvalue \\\n", "0 ionMobilityTest 6235.848633 6248.428223 2848190.0 0.000035 \n", "1 ionMobilityTest 6255.645996 6266.515137 35433.9 0.000178 \n", "2 ionMobilityTest 6236.052702 6248.632050 1137201.5 0.000080 \n", "\n", " consensusApex consensusApexIntensity precursor_charge sequence \\\n", "0 6242.15000 352642.161350 2 AFVDFLSDEIK \n", "1 6256.67000 2888.987036 2 AFVDFLSDEIK \n", "2 6241.44882 NaN 2 AFVDFLSDEIK \n", "\n", " software consensusApexIM peakWidth \n", "0 OpenSWATH 0.978579 12.579590 \n", "1 OpenSWATH 0.981811 10.869141 \n", "2 DIA-NN 0.980000 12.579348 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "features_df['peakWidth'] = features_df['rightBoundary'] - features_df['leftBoundary']\n", "features_df" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.10" } }, "nbformat": 4, "nbformat_minor": 5 }